Characterizing cancer and COVID-19 outcomes using electronic health records

Purpose Patients with cancer often have a compromised immune system, which can lead to worse COVID-19 outcomes. The purpose of this study is to assess the association between COVID-19 outcomes and existing cancer-specific characteristics. Patients and methods Patients aged 18 or older with laboratory-confirmed COVID-19 between June 1, 2020, and December 31, 2020, were identified (n = 314 004) from the Optum® de-identified COVID-19 Electronic Health Record (EHR) data set derived from more than 700 hospitals and 7000 clinics in the United States. To allow sufficient observational time, patients with less than one year of medical history in the EHR dataset before their COVID-19 tests were excluded (n = 42 365). Assessed COVID-19 outcomes included all-cause 30-day mortality, hospitalization, ICU admission, and ventilator use, which were compared using relative risks (RRs) according to cancer status and treatments. Results Among 271 639 patients with COVID-19, 18 460 had at least one cancer diagnosis: 8034 with a history of cancer and 10 426 with newly diagnosed cancer within one year of COVID-19 infection. Patients with a cancer diagnosis were older and more likely to be male, white, and Medicare beneficiaries, and had higher prevalences of chronic conditions. Cancer patients had higher risks for 30-day mortality (RR 1.07, 95% CI 1.01–1.14, P = 0.028) and hospitalization (RR 1.04, 95% CI 1.01–1.07, P = 0.006) but no significant differences in ICU admission and ventilator use compared to non-cancer patients. Recent cancer diagnoses were associated with higher risks for worse COVID-19 outcomes (RR for mortality 1.17, 95% CI 1.08–1.25, P<0.001 and RR for hospitalization 1.10, 95% CI 1.06–1.14, P<0.001), particularly for recent metastatic (stage IV), hematological, liver and lung cancers compared with the non-cancer group. Among COVID-19 patients with recent cancer diagnosis, mortality was associated with chemotherapy or radiation treatments within 3 months before COVID-19. Older age, Black race, Medicare coverage, residence in the South geographic region, and cardiovascular, diabetes, liver, and renal diseases were also associated with increased mortality. Conclusions and relevance Individuals with cancer had higher risks for 30-day mortality and hospitalization after SARS-CoV-2 infection compared to patients without cancer. More specifically, patients with a cancer diagnosis within 1 year and those receiving active treatment were more vulnerable to worse COVID-19 outcomes.


Introduction
Coronavirus disease (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), disproportionately affects individuals with underlying medical conditions [1,2]. Cancer has been considered a risk factor for severe COVID-19 outcomes [3]. Because of a compromised immune system due to cancer or cancer treatment, patients with cancer are generally more susceptible to infectious agents, which may lead to increased morbidity and mortality risks from COVID-19. Earlier, generally smaller, studies reported that COVID-19 patients with cancer were at a higher risk for severe complications or death compared to those without cancer, especially individuals with lung cancers, hematological cancers, metastatic cancer, or recent cancer treatment [4][5][6][7]. However, more recent findings have been somewhat inconsistent. Some multicenter studies with larger samples of cancer patients and matched or comparable non-cancer patients confirmed that patients with cancer were at increased risk for COVID-19 infection and worse outcomes [8][9][10], while other studies claimed that cancer diagnosis or treatment was not associated with the outcomes [11][12][13][14][15]. Recent studies showed that the rates of severe illness from COVID-19 were comparable between those with and without cancer [13,14], and that worse outcomes were driven by pre-existing conditions or initial COVID-19 severity, but not cancer characteristics [11,12,15].
Reported factors associated with increased mortality from COVID-19 in patients with cancer were similar to those in patients without cancer and include age, male sex, smoking, and the number of comorbidities [4][5][6][7][10]. Studies have also identified racial disparities in COVID-19 outcomes among cancer patients. For example, among patients with cancer, non-Hispanic Black and Hispanic patients were at higher risk for poorer COVID-19 outcomes [10,16,17].
Despite the accumulation of real-world data and evidence, the effect of cancer on COVID-19 outcomes has not been fully characterized. The impact of factors such as cancer treatment and social determinants remains to be elucidated. Determining the independent contribution of cancer to COVID-19 outcomes is challenging, however, because cancer and COVID-19 share many risk factors, including older age and comorbidities such as obesity. The goal of this study is to leverage the large-scale Optum® de-identified COVID-19 Electronic Health Record (EHR) data set, derived from more than 700 hospitals and 7000 clinics in the United States, to systematically compare COVID-19 outcomes between patients with and without cancer. We are interested in whether recent diagnosis (within 1 year), active treatment, type of cancer, and other factors are associated with 30-day mortality among COVID-19 patients with cancer after adjusting for other comorbidities. This large set of cancer-related EHR data also allowed us to perform subgroup analyses comparing COVID-19 outcomes by cancer type and cancer treatment against non-cancer counterparts, and to determine prognostic factors among COVID-19 patients with cancer. To account for the recognized overrepresentation of severe COVID-19 cases early in the pandemic, evidenced by substantially higher hospitalization and death rates in early studies, this study excluded COVID-19 cases before June 2020 from the analysis.

Data source
This study uses licensed data from Optum®. In response to the urgent need to understand the clinical impact of SARS-CoV-2 infection, Optum® developed a data pipeline with a minimal time lag while preserving as much clinical information as possible. The data are sourced and de-identified from Optum®'s longitudinal EHR repository, derived from more than 700 hospitals and 7000 clinics in the United States. The COVID-19 dataset incorporates a wide swath of raw clinical information, including new, unmapped COVID-specific clinical data points from both inpatient and ambulatory electronic medical records. At the time of our study, the dataset included patient-level, longitudinal clinical records covering demographics, diagnoses, procedures, lab tests, care settings, medications prescribed or administered, and mortality for about 4.2 million unique individuals. The study protocol was reviewed and approved by the Committee for the Protection of Human Subjects (CPHS) at The University of Texas Health Science Center at Houston. Our study followed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guideline.

Study population
Patients were included if they had laboratory-confirmed COVID-19 between June 1, 2020, and December 31, 2020, and had either a record of a cancer diagnosis code before COVID-19 or no evidence of a cancer diagnosis at any time point (n = 348 460). To minimize potential changes in available meaningful treatments and in mortality from infection with a new variant of SARS-CoV-2, we did not include either earlier infections before June 2020 or later infections after December 31, 2020 [18,19]. Positive COVID-19 status was determined by the detection of SARS-CoV-2 in a polymerase chain reaction (PCR) test, and the positivity date was based on the date of sample collection. Patients who were younger than 18 years (n = 34 117) or had missing age (n = 36) or missing sex information (n = 358) were excluded. To allow sufficient observational time to determine cancer status and baseline characteristics, patients with less than one year of medical history in the dataset prior to their COVID-19 tests were excluded (n = 42 365). The final cohort included 271 639 patients: 253 179 patients without any cancer history and 18 460 patients with any cancer history (Fig 1).
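The eligibility rules above can be sketched as a single screening function. This is a minimal illustration, not the study's actual pipeline: the `Patient` fields, the reason strings, and the choice to treat "one year" as 365 days are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

STUDY_START = date(2020, 6, 1)
STUDY_END = date(2020, 12, 31)
MIN_LOOKBACK = timedelta(days=365)  # assumption: "one year" taken as 365 days


@dataclass
class Patient:
    # Hypothetical record layout; the real EHR schema is not described in the text.
    age: Optional[int]
    sex: Optional[str]
    first_record: date  # earliest EHR activity for the patient
    covid_date: date    # sample collection date of the positive PCR test


def exclusion_reason(p: Patient) -> Optional[str]:
    """Return why a patient is excluded, or None if the patient is eligible."""
    if not (STUDY_START <= p.covid_date <= STUDY_END):
        return "outside study window"
    if p.age is None:
        return "missing age"
    if p.age < 18:
        return "younger than 18"
    if p.sex is None:
        return "missing sex"
    if p.covid_date - p.first_record < MIN_LOOKBACK:
        return "less than one year of history"
    return None


eligible = Patient(age=45, sex="F", first_record=date(2018, 1, 1),
                   covid_date=date(2020, 7, 15))
print(exclusion_reason(eligible))  # None means the patient stays in the cohort
```

Returning a reason rather than a boolean makes it straightforward to tally the per-criterion exclusion counts that a cohort flow diagram reports.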

Cancer status and treatment
We determined cancer status using International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) and ICD-10-CM codes indicating any malignancy, including lymphoma and leukemia, except malignant nonmelanoma neoplasm of skin (S1 Table). Although ICD-10-CM replaced ICD-9-CM in October 2015, ICD-9-CM codes were included to identify cancer and comorbidities because some patient medical records were still reported using ICD-9 codes.
We conducted subgroup analyses limited to patients with a recent history of cancer (within one year before the COVID-19 date), comparing each cancer subgroup with non-cancer patients. The subgroups were metastatic cancer, solid cancer, hematological cancer, and the 13 most common cancer types (S1 Table). Specific cancer stage information was not available, but we identified metastatic cancer, which is commonly stage IV cancer, using ICD-9-CM and ICD-10-CM codes (S1 Table). Cancer treatments, including chemotherapy and radiation therapy, were identified using ICD-9-CM, ICD-10-CM, Berenson-Eggers Type of Service (BETOS) (S2 Table), and National Drug Codes (NDCs) [20].
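The three-way cancer status used throughout the analysis (non-cancer, cancer history, recent cancer) can be illustrated with a short sketch. The text does not say which diagnosis date anchors the one-year window, so the rule below (the most recent cancer diagnosis code before the COVID-19 date, with "one year" as 365 days) is an assumption, as are the function name and label strings.

```python
from datetime import date, timedelta
from typing import Iterable

ONE_YEAR = timedelta(days=365)  # assumption: "one year" taken as 365 days


def cancer_status(cancer_dx_dates: Iterable[date], covid_date: date) -> str:
    """Classify a patient from the dates of their cancer diagnosis codes.

    Assumed rule: the most recent cancer code dated before COVID-19 decides
    between "recent cancer" and "cancer history".
    """
    dates = list(cancer_dx_dates)
    if not dates:
        return "non-cancer"  # no cancer code at any time point
    prior = [d for d in dates if d < covid_date]
    if not prior:
        # Cancer coded only after COVID-19: fits neither study group.
        return "excluded"
    latest = max(prior)
    return "recent cancer" if covid_date - latest <= ONE_YEAR else "cancer history"


covid = date(2020, 7, 1)
print(cancer_status([date(2020, 1, 15)], covid))
print(cancer_status([date(2018, 5, 1)], covid))
```

A different anchoring rule, such as the first-ever cancer code, would move some patients from the recent group to the history group, so the choice should be stated explicitly in any replication.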

Outcome measures and covariates
Our primary objective was to determine the effect of cancer on the outcomes of COVID-19, specifically hospitalization, intensive care unit (ICU) admission, ventilator use, and all-cause death occurring within 30 days of COVID-19 infection, by comparing these outcomes between COVID-19 patients with and without a cancer diagnosis. The secondary objective was to identify factors associated with mortality in COVID-19 patients with recent cancer treatment. Potential predictors included chemotherapy and radiation therapy within 3 months before COVID-19, age, sex (not included for breast, endometrial, and prostate cancers), race/ethnicity, insurance type, geographic region, and comorbid conditions, including chronic conditions and known risk factors (S3 Table) [21,22].
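Both the outcomes and the treatment exposure are defined by time windows around the COVID-19 date, which can be sketched as two small predicates. The function names are hypothetical, and approximating "3 months" as 90 days and counting the window boundaries as inclusive are assumptions, since the text does not specify either.

```python
from datetime import date, timedelta
from typing import Iterable, Optional

OUTCOME_WINDOW = timedelta(days=30)
TREATMENT_WINDOW = timedelta(days=90)  # assumption: "3 months" approximated as 90 days


def outcome_within_30_days(covid_date: date, event_date: Optional[date]) -> bool:
    """True if an event (death, hospitalization, ...) occurs 0 to 30 days after COVID-19."""
    if event_date is None:  # the event never happened
        return False
    return timedelta(0) <= event_date - covid_date <= OUTCOME_WINDOW


def treated_before_covid(treatment_dates: Iterable[date], covid_date: date) -> bool:
    """True if any chemotherapy or radiation date falls 0 to 90 days before COVID-19."""
    return any(
        timedelta(0) <= covid_date - d <= TREATMENT_WINDOW
        for d in treatment_dates
    )


covid = date(2020, 7, 1)
print(outcome_within_30_days(covid, date(2020, 7, 20)))
print(treated_before_covid([date(2020, 4, 15)], covid))
```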

Statistical analysis
Differences in baseline characteristics and outcome measures between the cancer and non-cancer groups were assessed using chi-square tests for categorical variables and Wilcoxon rank-sum tests for numeric variables. The effect of cancer status on outcomes was measured using the relative risk (RR) from a modified Poisson regression model including age, sex, race/ethnicity, insurance status, cancer treatment, and comorbidities as predictors [23]. To determine the effect of cancer type and stage, we measured RRs between each subgroup of cancer patients and non-cancer patients. Significance was set at P < 0.05 for 2-tailed tests, and all analyses were performed using Stata 16.0 (StataCorp, College Station, TX).
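To make the relative risk concrete, the sketch below computes an unadjusted RR with a large-sample 95% confidence interval on the log scale. It is not the adjusted modified Poisson model used in this study: it covers only the special case of one binary exposure and no covariates, where the RR is simply the ratio of the two observed risks. The function name and the counts in the example are hypothetical and are not taken from the study data.

```python
import math
from typing import Tuple


def relative_risk(events_exposed: int, n_exposed: int,
                  events_unexposed: int, n_unexposed: int,
                  z: float = 1.96) -> Tuple[float, float, float]:
    """Unadjusted relative risk with a Wald confidence interval on the log scale.

    Returns (rr, lower, upper). Requires at least one event in each group.
    """
    risk_exposed = events_exposed / n_exposed
    risk_unexposed = events_unexposed / n_unexposed
    rr = risk_exposed / risk_unexposed
    # Standard error of log(RR) for two independent binomial proportions.
    se_log_rr = math.sqrt(
        1 / events_exposed - 1 / n_exposed
        + 1 / events_unexposed - 1 / n_unexposed
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical counts: 40 deaths among 200 exposed, 80 among 800 unexposed.
rr, lower, upper = relative_risk(40, 200, 80, 800)
print(f"RR {rr:.2f}, 95% CI {lower:.2f}-{upper:.2f}")
```

The adjusted estimates reported in the Results come from a multivariable model, so they cannot be reproduced from crude counts alone; large gaps between crude and adjusted RRs indicate how much of the raw difference is explained by covariates such as age and comorbidities.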

Characteristics of patients with and without cancer
The final cohort included 271 639 patients with confirmed COVID-19. We identified 18 460 patients with at least one cancer diagnosis, of whom 8034 had a history of cancer more than 1 year before COVID-19 and 10 426 had a recent cancer diagnosis, defined as cancer diagnosed within one year before COVID-19. Patients with a cancer diagnosis were older (median age 66 [56-76] in cancer vs 46 [32-60] in non-cancer), more likely to be male (45% vs 43%), White (80% vs 72%), and Medicare beneficiaries (33% vs 11%), and had higher prevalences of comorbid conditions (Table 1). Within the cancer group, baseline characteristics were similar between the cancer history and recent cancer groups (Table 1).

COVID-19 outcomes with and without cancer
All-cause 30-day mortality was more than three times higher in the cancer cohort than in the non-cancer cohort (6.8% vs 1.9%), and the percentages of COVID-19 patients with cancer who required hospitalization, ICU admission, and ventilator use were more than twice as high as in patients without cancer (Table 1). After adjusting for age, sex, race/ethnicity, and risk factors, we found that cancer was associated with a 7% increase in mortality risk (RR 1.07, 95% CI 1.01-1.14, P = 0.028) and a 4% increase in hospitalization risk (RR 1.04, 95% CI 1.01-1.07, P = 0.006) (Tables 2 and S4). When we compared patients with a history of cancer to those without cancer, mortality and hospitalization were higher, but the differences were not statistically significant. When we further compared patients with a history of cancer to those with recent cancer, the percentage of COVID-19 patients with recent cancer who required hospitalization, ICU admission, and

Discussion
We found that a cancer diagnosis was in general associated with an increased risk for mortality and hospitalization among COVID-19 patients. However, a history of cancer more than one year before COVID-19 diagnosis was not significantly associated with increased mortality or hospitalization. Recent cancer, particularly recent metastatic (stage IV), hematological, liver, and lung cancers, was associated with worse COVID-19 outcomes compared to the non-cancer group. A higher mortality rate was associated with active radiation or systemic therapy within 3 months before COVID-19 diagnosis. Older age, Black race, Medicare coverage, residence in the South geographic region, and cardiovascular, diabetes, liver, and renal diseases were also independently associated with increased risks for 30-day mortality from COVID-19.
Our findings were consistent with earlier studies and some of the recent studies [4][5][6][7][8][9][10] on the association between cancer-specific factors and increased mortality. However, we found mortality and hospitalization rates to be significantly lower in both cancer and non-cancer groups compared to those reported in other studies. For example, in a similar study using multicenter EHR data, Wang et al reported a 14.9% death rate among COVID-19 patients with cancer and 5.3% among non-cancer patients [10]. Given that estimated mortality rates are less than 2% in the general population, such a rate for non-cancer patients appears to be inflated [24]. In contrast, our study found 6.8% mortality in the cancer group and 1.9% mortality in the non-cancer group, closer to reported rates within the general population. Caution must be exercised, as in our study, when handling data from the early pandemic period (before June 2020), when comprehensive testing was not widely available and treatment strategies were still being identified, which could result in an overrepresentation of severe COVID-19 outcomes.
Contrary to our findings, some recent studies have shown that rates of severe illness from COVID-19 were comparable between those with and without cancer and reported no effect of cancer characteristics on COVID-19 outcomes [11][12][13][14][15]. A study of 928 patients from the United States, Canada, and Spain enrolled in the COVID-19 and Cancer Consortium reported no increased risk of death associated with cancer type or timing of cancer treatment [26]. Another analysis of 423 patients with symptomatic COVID-19 at a New York cancer center found that neither recent receipt of chemotherapy or surgery nor having metastatic cancer was associated with higher risks of complications [27]. Some of these studies reported higher risks of worse outcomes for patients with a cancer diagnosis overall, but these associations were no longer present after adjustment for pre-existing conditions [11,12,15]. It is possible that in these studies the number of cancer patients was too small to reach statistical significance when adjusting for demographics, geolocation, and multiple comorbidities. Our study had a substantial number of patients in subgroups by cancer type, which allowed us to compare COVID-19 outcomes for each cancer type with matched non-cancer patients. In addition to hematological and lung cancer, which have been previously reported to be associated with worse outcomes, we also found that liver and pancreatic cancers were associated with increased mortality (RR 2.46, 95% CI 1.80-3.36, P<0.001 for liver cancer vs non-cancer; RR 1.94, 95% CI 1.19-3.16, P = 0.008 for pancreatic cancer vs non-cancer).
Our study has several limitations. First, because the source data were not collected specifically for oncology, they had limited resolution for certain cancer characteristics. We relied on ICD-9 and ICD-10 codes to identify cancer patients and had limited cancer staging information beyond the determination of stage IV (metastatic) cancer. Second, we defined the diagnosis of cancer based on the occurrence of diagnosis codes, which could result in false positive classification of cancer diagnosis. However, despite the potential false positive identification of cancer history, we still observed worse outcomes for patients with a cancer diagnosis compared with patients without cancer. To reduce missing data and misclassification of cancer cases, comorbidities, and cancer treatments, we required a lookback period as an inclusion criterion and used combinations of codes to capture chemotherapy and radiation treatments.

Conclusion
This large-scale population study confirmed an increased risk of death and hospitalization among COVID-19 patients with any diagnosis of cancer compared to non-cancer patients. In particular, individuals with a cancer diagnosis within 1 year of COVID-19 and those who had received cancer treatment within 3 months of COVID-19 had notably worse COVID-19 outcomes. Among cancer patients, diagnoses of hematological malignancies and of lung, liver, and pancreatic cancer were independently associated with worse outcomes. Finally, this study reaffirmed the presence of racial disparities in COVID-19 treatment and outcomes among particularly vulnerable populations.